Abstract

This paper describes the concept, technical realisation and validation of a largely data-driven
method to model events with Z → ττ decays. In Z → μμ events selected from proton-proton
collision data recorded at √s = 8 TeV with the ATLAS experiment at the LHC in 2012, the
Z decay muons are replaced by τ leptons from simulated Z → ττ decays at the level of
reconstructed tracks and calorimeter cells. The τ lepton kinematics are derived from the
kinematics of the original muons. Thus, only the well-understood decays of the Z boson
and τ leptons as well as the detector response to the τ decay products are obtained from
simulation. All other aspects of the event, such as the Z boson and jet kinematics as well as
effects from multiple interactions, are given by the actual data. This so-called τ-embedding
method is particularly relevant for Higgs boson searches and analyses in ττ final states,
where Z → ττ decays constitute a large irreducible background that cannot be obtained
directly from data control samples. In this paper, the relevant concepts are discussed based
on the implementation used in the ATLAS Standard Model H → ττ analysis of the full
dataset recorded during 2011 and 2012.


© 2015 CERN for the benefit of the ATLAS Collaboration. 

Reproduction of this article or parts of it is allowed as specified in the CC-BY-3.0 license. 




1 Introduction 


The experimental sensitivity of searches for (and eventually studies of) Higgs bosons in ττ final states at
the LHC is driven by analyses of intricate event signatures that are not restricted to the Higgs candidate
decay products. For example, the missing transverse momentum enters the reconstruction of the di-τ
invariant mass mττ, which is a key quantity in these analyses. The shape of the reconstructed mττ
distribution also depends on the boost of the ττ system and thus on the presence and kinematics of additional
jets in the event. In addition, details of the final-state topology are used to define event categories, for
example based on vector-boson fusion topologies characterised by two high-energy jets with large rapidity
separation, and recent ATLAS analyses [1] also combine them into multivariate classifiers to extract the
Higgs boson signal.

In these analyses, events with Z/γ* → ττ decays constitute a large irreducible background, and thus a
reliable and detailed model of these processes is a critical ingredient. In view of the complexity of the 
relevant event properties it is highly desirable to rely as little as possible on simulation; moreover, it has 
been shown in dedicated measurements [2-4] that existing Monte Carlo simulations of Z+jets events need 
to be corrected in order to model the data. Ideally the model would be obtained directly from the collision 
data. However, due to background contributions, e.g. from events with other objects misidentified as τ
decays, it is difficult to select a sufficiently pure Z/γ* → ττ sample from the data, and doing so without
also including Higgs boson decays to τ lepton pairs is conceptually impossible.

Z/γ* → ττ events can still be modelled in a largely data-driven way by using Z/γ* → μμ events as a
starting point.¹ Except for effects due to the difference in muon and τ lepton masses, the two processes
are kinematically identical assuming lepton universality. In particular the kinematics of the Z boson and
additional jets in the event are independent of the Z decay mode. By requiring two isolated, high-energy
muons with opposite charge, Z → μμ decays can be selected from the data with high efficiency and
purity, and due to the small muon mass and correspondingly small Higgs-muon coupling, the H → μμ
contamination is expected to be negligible for all practical purposes. The detector response to the Z decay
muons can be removed from the data events and replaced by corresponding information for τ leptons from
simulated Z → ττ decays, where the τ kinematics are derived from the kinematics of the original muons
(taking into account both the τ-μ mass difference and the τ-τ spin correlation). This substitution results
in a Z → ττ event model where only the well-understood decays of the Z boson and τ leptons and the
detector response to the τ lepton decay products are obtained from the simulation. All other aspects of the
event - including, for example, the kinematics of the Z boson and additional jets, the underlying event as
well as effects from multiple interactions - are directly taken from the data. The simulated and
collision-data information are combined based on reconstructed tracks and calorimeter cells, followed by a
re-reconstruction of the resulting hybrid events. In the following, this technique is referred to as embedding
of simulated Z → ττ decays in Z → μμ data events (or, in short, τ embedding). It has been used in all
H → ττ searches by ATLAS [5-8] to date, including the most recent analysis [1] establishing evidence
for this decay. Corresponding CMS analyses [9, 10] have applied a similar technique. In addition, the
method was adapted to single-τ processes for use in the analysis of W → τν_τ decays [11] and searches
for charged Higgs bosons [12, 13].

This paper describes the concept, technical realisation and validation of the τ embedding corresponding
to the implementation used in the ATLAS H → ττ analysis [1] of the full pp collision dataset recorded
during 2011 and 2012 at centre-of-mass energies of √s = 7 TeV and √s = 8 TeV, respectively. The
method is valid for all τ lepton decay channels. However, here the discussion and examples focus on final
states where one of the τ leptons decays leptonically and the other one hadronically, also referred to below
as the lepton-hadron ττ decay mode. This corresponds to the most sensitive H → ττ channel and tests
the embedding of both the leptonic and hadronic τ decays. After a description of the ATLAS detector and
the final-state reconstruction algorithms in Section 2, Section 3 provides an overview of the relevant event
samples and selections. Section 4 outlines the concept and implementation of the τ-embedding method.
Studies to validate the procedure and associated systematic uncertainties are discussed in Section 5. A
summary and conclusions are given in Section 6.

¹ For simplicity, these processes are hereafter denoted by Z → ττ and Z → μμ, respectively.


2 Experimental setup 

2.1 The ATLAS detector 

The ATLAS detector [14] at the LHC covers nearly the entire solid angle around the collision point. It 
consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic 
and hadronic calorimeters, and a muon spectrometer incorporating three large superconducting toroid 
magnets, each with eight coils. The inner-detector system (ID) is immersed in a 2 T axial magnetic field
and provides charged-particle tracking in the pseudorapidity range² |η| < 2.5. The high-granularity silicon
pixel detector covers the vertex region and typically provides three measurements per track. It is followed
by the silicon microstrip tracker which usually provides four two-dimensional measurement points per
track. These silicon detectors are complemented by the transition radiation tracker, which enables radially
extended track reconstruction up to |η| = 2.0. The transition radiation tracker also provides electron
identification information based on the fraction of hits (typically 30 in total) above a higher
energy-deposit threshold corresponding to transition radiation. The calorimeter system covers the pseudorapidity
range |η| < 4.9. Within the region |η| < 3.2, electromagnetic calorimetry is provided by barrel and
endcap high-granularity lead/liquid-argon (LAr) electromagnetic calorimeters, with an additional thin LAr
presampler covering |η| < 1.8, to correct for energy loss in material between the interaction vertex and
the calorimeters. Hadronic calorimetry is provided by the steel/scintillating-tile calorimeter, segmented
into three barrel structures within |η| < 1.7, and two copper/LAr hadronic endcap calorimeters. The solid
angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised
for electromagnetic and hadronic measurements respectively. The muon spectrometer (MS) comprises
separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic
field generated by superconducting air-core toroids. The precision chamber system covers the region
|η| < 2.7 with three layers of monitored drift tubes, complemented by cathode strip chambers in the
forward region, where the background is highest. The muon trigger system covers the range |η| < 2.4
with resistive plate chambers in the barrel, and thin gap chambers in the endcap regions. A three-level 
trigger system is used to select interesting events [15]. The Level-1 trigger is implemented in hardware 
and uses a subset of detector information to reduce the event rate to a design value of at most 75 kHz. This 
is followed by two software-based trigger levels which together reduce the event rate to about 400 Hz. 


² ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector
and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points
upwards. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the beam pipe.
The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2). Angular distance is measured in units of
ΔR = √((Δη)² + (Δφ)²).
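
The coordinate definitions in the footnote above can be illustrated with a minimal Python sketch. The function names are illustrative only and do not correspond to ATLAS software; wrapping the azimuthal difference into (−π, π] is assumed here as the usual convention and is not stated in the text.

```python
import math


def pseudorapidity(theta):
    """eta = -ln tan(theta/2) for a polar angle theta in radians (0 < theta < pi)."""
    return -math.log(math.tan(theta / 2.0))


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi] (assumed convention)."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance: sqrt((delta eta)^2 + (delta phi)^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))
```

A particle emitted perpendicular to the beam (θ = π/2) has η = 0, and two directions at φ = 3.0 and φ = −3.0 are separated by 2π − 6.0 in azimuth rather than by 6.0.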
2.2 Final-state reconstruction


Muon candidates are reconstructed using an algorithm [16] that combines information from the ID and
the MS. The distance between the z-position of the point of closest approach of the muon inner-detector
track to the beam-line and the z-coordinate of the primary vertex³ is required to be less than 1 cm. This
requirement reduces the contamination due to cosmic-ray muons and beam-induced backgrounds. Muon
quality criteria such as inner-detector hit requirements are applied in order to achieve a precise
measurement of the muon momentum and reduce the misidentification rate. Muons are required to have a
momentum in the transverse plane pT > 10 GeV and a pseudorapidity of |η| < 2.5. Isolation requirements
on close-by tracks and energy depositions in the calorimeter are applied in order to distinguish prompt
muons from other candidates originating e.g. from hadronic showers.

Electron candidates are reconstructed from energy clusters in the electromagnetic calorimeters matched
to a track in the ID. They are required to have a transverse energy, ET = E sin θ, greater than 15 GeV,
be within the pseudorapidity range |η| < 2.47 and satisfy the medium shower shape and track selection
criteria defined in Ref. [17]. Candidates found in the transition region between the end-cap and barrel
calorimeters (1.37 < |η| < 1.52) are not considered. As for the muons, isolation criteria are applied to
suppress non-prompt candidates originating e.g. from hadronic showers.

Jets are reconstructed using the anti-kt jet clustering algorithm [18, 19] with a radius parameter R = 0.4,
taking topological energy clusters [20] in the calorimeters as inputs. Jet energies are corrected for the
contribution of multiple interactions using a technique based on jet area [21] and are calibrated using
pT- and η-dependent correction factors determined from simulation and data [22-24]. Jets are required
to be reconstructed in the range |η| < 4.5 and to have pT > 30 GeV. To reduce the contamination by jets
from additional pp interactions in the same or neighbouring bunch crossings (pile-up), tracks originating
from the primary vertex must contribute a large fraction of the pT when summing the scalar pT of all
tracks associated with the jet. This jet vertex fraction (JVF) is required to be at least 50% for jets with
pT < 50 GeV and |η| < 2.4. Jets with no associated tracks are retained.
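
The jet vertex fraction requirement can be sketched as follows. This is a minimal Python illustration with hypothetical function and argument names; it assumes that the track-to-jet association and the assignment of each track to the primary vertex are already available as inputs, which is not how the ATLAS reconstruction is organised in detail.

```python
def jet_vertex_fraction(track_pts, from_primary_vertex):
    """Fraction of the summed scalar track pT that comes from primary-vertex tracks.

    Returns None for a jet without associated tracks.
    """
    total = sum(track_pts)
    if total == 0:
        return None
    from_pv = sum(pt for pt, is_pv in zip(track_pts, from_primary_vertex) if is_pv)
    return from_pv / total


def passes_jvf(jet_pt, jet_eta, jvf, pt_max=50.0, eta_max=2.4, min_jvf=0.5):
    """Apply the JVF requirement only to jets with pT < 50 GeV and |eta| < 2.4."""
    if jet_pt >= pt_max or abs(jet_eta) >= eta_max:
        return True  # requirement not applied outside this region
    if jvf is None:
        return True  # jets with no associated tracks are retained
    return jvf >= min_jvf
```

For example, a jet with track pT values of 10, 5 and 5 GeV, of which the first and last originate from the primary vertex, has a JVF of 0.75 and is kept.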

Hadronically decaying τ leptons are reconstructed starting from clusters of energy depositions in the
electromagnetic and hadronic calorimeters. The τhad⁴ reconstruction is seeded by the anti-kt jet-finding
algorithm with a radius parameter R = 0.4. Tracks with pT > 1 GeV within a cone of size ΔR = 0.2
around the cluster barycentre are assigned to the τhad candidate. Its momentum is calculated from the
topological energy clusters associated with the jet seed after applying a dedicated τhad energy calibration.
The τhad charge is determined from the sum of the charges of the associated tracks. The rejection of jets is
provided in a separate identification step using discriminating variables based on tracks with pT > 1 GeV
and the energy deposited in calorimeter cells found in the core region (ΔR < 0.2) and in the region
0.2 < ΔR < 0.4 around the τhad candidate's direction. Such discriminating variables are combined in
a boosted decision tree and three working points, labelled tight, medium and loose [25], are defined,
corresponding to different τhad identification efficiency values. In the studies presented in this paper, τhad
candidates with pT > 20 GeV and |η| < 2.47 are used. The τhad candidates are required to have one or
three reconstructed tracks with a total charge of ±1 and to satisfy the medium criteria, which provide an
identification efficiency of the order of 55-60%. Dedicated criteria [25] to separate τhad candidates from
misidentified electrons are also applied, with a selection efficiency for true hadronic τ decays of 95%.
The probability to misidentify a jet with pT > 20 GeV as a τhad candidate is typically 1-2%.

³ The primary vertex is the proton-proton vertex candidate with the highest sum of the squared transverse momenta of all
associated tracks.

⁴ In the following, the τhad symbol always refers to the visible decay products of the hadronic τ decay.
Following their reconstruction, candidate leptons, hadronically decaying τ leptons and jets may point
to the same energy deposits in the calorimeters. Two reconstructed objects are considered to overlap if
their separation ΔR is smaller than 0.2. Such overlaps are removed by selecting objects in the following
order of priority (from highest to lowest): muons, electrons, τhad, and jet candidates. Objects with lower
priority are discarded when overlapping with another object with higher priority. The leptons that are
considered in overlap removal with τhad candidates need only to satisfy looser criteria than those defined
above, to reduce misidentified τhad candidates from leptons. The pT threshold of muons considered in
overlap removal is also lowered to 4 GeV.
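
The priority-based overlap removal can be sketched in Python as shown below. The object representation (dictionaries with `kind`, `eta` and `phi` keys) and the function names are hypothetical. The sketch also assumes that a lower-priority object is only compared against higher-priority objects that have themselves been kept; the text does not specify this detail, nor does the sketch model the looser lepton criteria used for the τhad overlap removal.

```python
import math

# Priority order from highest to lowest, as listed in the text.
PRIORITY = ("muon", "electron", "tau_had", "jet")


def delta_r(a, b):
    """Angular distance between two objects with 'eta' and 'phi' entries."""
    dphi = (a["phi"] - b["phi"]) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)


def remove_overlaps(objects, dr_max=0.2):
    """Keep objects in priority order, dropping those close to a kept higher-priority object."""
    kept = []
    for obj in sorted(objects, key=lambda o: PRIORITY.index(o["kind"])):
        rank = PRIORITY.index(obj["kind"])
        overlaps = any(
            PRIORITY.index(k["kind"]) < rank and delta_r(obj, k) < dr_max
            for k in kept
        )
        if not overlaps:
            kept.append(obj)
    return kept
```

With a muon at (η, φ) = (0, 0), a τhad candidate at (0.1, 0) and jets at (0.1, 0.05) and (1.0, 1.0), only the muon and the distant jet survive.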

The missing transverse momentum (with magnitude ETmiss) is reconstructed using the energy deposits in
calorimeter cells calibrated according to the reconstructed physics objects (e, γ, τhad, jets and μ) with
which they are associated [26]. The transverse momenta of reconstructed muons are included in the ETmiss
calculation, with the energy deposited by these muons in the calorimeters taken into account. The energy
from calorimeter cells not associated with any physics object is scaled according to a soft-term vertex
fraction and also included in the ETmiss calculation. This fraction is the ratio of the summed scalar pT
of tracks from the primary vertex not matched with objects to the summed scalar pT of all tracks in the
event also not matched to objects. This method allows a better reconstruction of the ETmiss in high pile-up
conditions [27].


3 Data samples and event selection 

3.1 Event samples 

The studies presented in this paper are based on data recorded with ATLAS during the 2012 LHC run at a
proton-proton centre-of-mass energy √s = 8 TeV. After data-quality requirements, these correspond to
an integrated luminosity of 20.3 fb⁻¹.

For the validation of the r-embedding procedure, samples of Monte Carlo simulated (MC) events with 
Z → μμ and Z → ττ decays are used as input or as reference, respectively. Simulated events are
produced with the Alpgen [28] event generator employing the MLM matching scheme [29] between 
the hard process (calculated with leading-order matrix elements for up to five partons) and the parton 
shower. The Cteq6L1 parameterisation of the parton distribution functions [30] is used and the Pythia8 
program [31] provides the modelling of the parton shower, the hadronisation and the underlying event. 
A full simulation of the ATLAS detector response [32] using the Geant4 program [33] is performed. In 
addition, events from minimum-bias interactions are simulated using the AU2 [34] tuning of Pythia8. 
They are overlaid with the simulated signal and background events according to the luminosity profile of 
the recorded data. The contributions from these pile-up interactions are simulated both within the same 
bunch crossing as the hard-scattering process and in neighbouring bunch crossings. Finally, the resulting 
simulated events are processed through the same reconstruction programs as the data. 

In the simulation of the Z → ττ decays that are embedded into the Z → μμ input events as described
in Section 4.1, the r decay products are generated using Tauola [35], and Photos [36] provides photon 
radiation from charged leptons. 
From these datasets, the following event samples are derived:

• Replacing the muons from recorded Z → μμ data events with τ leptons from simulated Z → ττ
decays as described in Section 4.1 results in τ-embedded data, which are the standard event samples
used in physics analyses to model Z → ττ processes.

• μ-embedded data are obtained by using simulated Z → μμ decays instead of Z → ττ decays to
replace the muons in the Z → μμ input data events. These make it possible to study systematic
effects of the embedding procedure in comparatively simple final states. While the τ-embedded
samples are based on the full 2012 dataset, the μ-embedded validation is restricted to a subset
corresponding to an integrated luminosity of 1.0 fb⁻¹.

• Using simulated instead of data Z → μμ events as input yields μ- or τ-embedded MC samples.
These can then be compared to direct simulations of these processes.

• In order to study effects originating from the reconstruction of the input muons as well as of
final-state radiation, alternative embedded MC samples are produced, where the kinematics of the
embedded objects are derived from the generator-level muons instead of the reconstructed momenta.
In the following, this is referred to as generator-seeded embedding, as opposed to the standard
detector-seeded procedure.


3.2 Event selection 

For the studies presented below, events are selected from one or several of the samples listed in Section 3.1 
using one of the following sets of criteria. In all cases, standard quality criteria are applied to ensure a 
fully operational detector and well-reconstructed events. 

• Z → μμ selection:

Collision events are selected using a combined dimuon trigger, with pT thresholds of 18 GeV for
the leading muon and 8 GeV for the sub-leading muon, or a single-muon trigger (pT(μ) > 24 GeV).
Only events with at least two good-quality muons (cf. Section 2.2) are accepted. The leading
(sub-leading) muon is required to fulfil pT(μ) > 20 (15) GeV. Both muons must be isolated in the ID,
which is ensured by requiring the scalar sum of other track transverse momenta in an isolation cone
of size ΔR = 0.4 to be smaller than 20% of the muon transverse momentum (I(pT, 0.4)/pT(μ) <
0.2). Only events containing at least one such opposite-charge muon pair with an invariant mass
mμμ > 40 GeV are considered.

• Z → ττ selection:

The ττ selection is adopted from the H → ττ lepton-hadron-channel analysis documented in
Ref. [1]. Both in simulated and recorded data samples, single-electron or single-muon triggers
with a lepton pT threshold of 24 GeV are used to select events, in which exactly one τ candidate
with pT(τhad) > 20 GeV fulfilling the medium identification criteria and either exactly one
electron or exactly one muon with pT(e/μ) > 26 GeV are required. In addition to a track isolation of
I(pT, 0.4)/pT(e/μ) < 0.06, a calorimeter isolation of I(ET, 0.2)/pT(e/μ) < 0.06 is applied to the
leptons, i.e. the scalar sum of the transverse energy deposited in calorimeter cells within ΔR < 0.2
not associated with the candidate is calculated and required to be smaller than 6% of the total
transverse momentum of the muon or the total transverse energy of the electron.
• Boosted Z-enriched selection:

The H → ττ lepton-hadron-channel analysis documented in Ref. [1] considers two signal event
categories: a VBF category enriched in vector-boson fusion Higgs production events and a boosted
category targeting mainly events with high-pT Higgs bosons produced via gluon-gluon fusion. For
the boosted category, a corresponding Z-enriched control sample is defined, which is adopted here
to illustrate the τ-embedding performance within physics analyses, see Section 5.2. This sample
includes events that pass the Z → ττ selection described above but fail the VBF category selection
detailed in Ref. [1]. In addition, the pT of the Z candidate reconstructed from the vector sum of
momenta of the visible τ decay products and the missing transverse momentum is required to
exceed 100 GeV. In order to further enhance the fraction of Z events, W decays are suppressed by
considering only events with a transverse mass⁵ mT < 40 GeV. Potential contamination by Higgs
signal events is avoided by requiring the invariant mass mττ(MMC) of the ττ pair not to exceed 110 GeV.
This mass is reconstructed from the visible τ decay products and the missing transverse momentum
with the so-called missing mass calculator (MMC) [37].
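
The kinematic part of the boosted Z-enriched selection can be sketched in Python as follows, using the transverse mass mT = √(2 · pT(ℓ) · ETmiss · (1 − cos Δφ)). The event is represented by a hypothetical dictionary; the flags `passes_z_tautau` and `passes_vbf` and the MMC mass `m_mmc` are assumed to be computed elsewhere, since neither the VBF selection nor the MMC algorithm is described in this section.

```python
import math


def transverse_mass(lepton_pt, met, dphi):
    """Transverse mass of the lepton and the missing transverse momentum (GeV)."""
    return math.sqrt(2.0 * lepton_pt * met * (1.0 - math.cos(dphi)))


def passes_boosted_z_selection(event):
    """Boosted Z-enriched control sample; all energies and momenta in GeV."""
    m_t = transverse_mass(event["lepton_pt"], event["met"], event["dphi_lepton_met"])
    return (
        event["passes_z_tautau"]      # Z -> tautau selection of Section 3.2
        and not event["passes_vbf"]   # must fail the VBF category selection
        and event["z_pt"] > 100.0     # pT of the reconstructed Z candidate
        and m_t < 40.0                # suppress W decays
        and event["m_mmc"] <= 110.0   # avoid Higgs signal contamination
    )
```

A lepton and missing transverse momentum of 40 GeV each, pointing back to back in azimuth, give mT = 80 GeV and would therefore be rejected as W-like.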


4 Embedding 


In the following, the τ-embedding method is described in more detail. Special properties of the resulting
event samples and embedding-specific systematic uncertainties are also discussed. 

4.1 Procedure 

The τ-embedding procedure can be separated into five consecutive steps as depicted in the flowchart
shown in Figure 1. After selecting the Z → μμ input event, a corresponding Z → ττ decay is generated
and passed to a full detector simulation. The muons in the input event are then replaced by the τ leptons
from the simulated Z decay. As a final step, a re-reconstruction of the resulting hybrid event is necessary,
since it would be insufficient to combine the event information at the level of fully reconstructed physics
objects. For example, the additional calorimeter energy depositions from pile-up events can change the
results of the ETmiss reconstruction, and the identification of hadronic τ decays is particularly sensitive
to the details of the calorimeter response. In contrast, corresponding effects on the reconstruction of 
charged-particle tracks from the individual tracking detector hits are expected to be negligible for the 
data-taking conditions and the phase space relevant to the Higgs analyses of the 8 TeV data. Therefore, the 
embedding procedure is performed at the level of calorimeter cells and reconstructed tracks, as described 
in more detail below. 


1. Selection of the Z → μμ input events from the collision data:

Input events for the embedding procedure are obtained according to the Z → μμ selection described
in Section 3.2. For events with more than two muon candidates, all possible oppositely-charged
pairs with a common vertex are formed, and the muon pair with mμμ closest to the Z boson mass is
chosen as the Z → μμ candidate decay products.


⁵ mT = √(2 · pT(ℓ) · ETmiss · (1 − cos Δφ)); Δφ is the azimuthal angle between the directions of the electron or muon ℓ and the
missing transverse momentum vector.
Figure 1: Flowchart of the embedding procedure.

2. Generation of a corresponding Z → ττ decay:

a) Substitution of muons with τ leptons and subsequent τ decays:

From the selected muons in a collision data event, the four-momenta of a corresponding
Z → ττ decay are derived: the production vertex of the τ leptons is set to the common production
vertex of the reconstructed muon pair, and each muon is then replaced by a τ lepton. The τ
four-momenta are rescaled according to

\vec{p}_\tau = \sqrt{E_\mu^2 - m_\tau^2} \, \frac{\vec{p}_\mu}{|\vec{p}_\mu|} ,

thus keeping the energy E_μ unchanged but replacing the muon mass with the τ mass m_τ.

The resulting Z → ττ kinematics as obtained from the Z → μμ events is processed with
Tauola and Photos. Here, the decay of each τ lepton pair by Tauola takes into account the
polarisation and spin correlations of the τ leptons. The Z polarisation, however, depends
on the parton configuration of the initial state, which is not directly available here.
During the generation of the decays, Tauola therefore assumes an average polarisation of zero
and assigns a random helicity of ±1 to each Z boson. The actual non-zero average Z
polarisation is correctly accounted for by applying event weights obtained with the TauSpinner
program [38, 39], which infers the most probable configuration of the initial partons and thus
the helicity of the Z boson from the decay product kinematics.

b) Kinematic filter for the decay products:

If the generation of Z → ττ decays were purely based on the probability distributions of the
actual decay kinematics, a large fraction of the embedded Z → ττ decay products would fail
the selection criteria of typical physics analyses. In particular the leptonic τ decays would
often end up below the relevant transverse momentum thresholds. Therefore, a kinematic
τ decay filter is implemented at generator level in order to increase the effective number of
τ-embedded Z → μμ events entering the ττ selection. Instead of generating only one ττ
decay for each Z collision data event, the Tauola program is used to produce 1000
different kinematic configurations of the decay products according to the appropriate probability
distributions. Only the first of the 1000 decay configurations in which the generated
transverse momenta of the visible decay products (e/μ/τhad) exceed certain threshold values is
then selected for further processing. The thresholds can be chosen based on the final ττ
analysis selection; for this paper, as for the H → ττ lepton-hadron-channel analysis presented in
Ref. [1], they were set to pT(τhad) > 15 GeV, pT(e) > 18 GeV and pT(μ) > 15 GeV, i.e. safely
below the ττ analysis selection thresholds of pT(τhad) > 20 GeV and pT(e/μ) > 26 GeV. The
selection of ττ decays according to these thresholds introduces kinematic biases as shown
in Figure 2(a) for the visible momentum of hadronic τ decays and in Figure 2(b) for the
vector sum of the neutrino transverse momenta, which corresponds to the expected missing
transverse momentum in the event. Based on all 1000 ττ decays generated for the given Z
kinematics, the probability to accept a random ττ decay configuration is evaluated for each event.
These probabilities correspond to event-by-event filter efficiencies and are thus propagated as
weights, which correct the kinematic biases as demonstrated in Figure 2.

Figure 2: Generator-level distributions of (a) the τhad transverse momentum and (b) the summed transverse
momenta of all neutrinos for τ-embedded events without filter (red open circles), after applying the filter (blue squares)
and after applying the filter with filter weights (black triangles) as described in the text. The lower panels show the
relative deviation of the corrected distributions from the unfiltered ones. The red shaded error band and the black
error bars correspond to the statistical uncertainty from the unfiltered and filtered events, respectively.

3. Detector simulation of the Z → ττ decay:

The result corresponds to a standard event generator output for a Z → ττ decay without any
underlying-event effects, but otherwise based on the standard ATLAS MC configuration [32], which 
is then handed over to the full ATLAS detector simulation and reconstruction. In order to avoid 
double counting in the later merging with the corresponding collision data event, the calorimeter 
noise is switched off during the simulation. In the following, the output of this simulation step is 
referred to as a mini event. 


4. Merging of data and simulated event:

In order to replace the muons in the selected Z → μμ data events with the corresponding
simulated τ leptons, all tracks associated with the original muons are removed from the data event.
The calorimeter cells associated with the muons are subtracted according to the following
procedure: a Z → μμ decay with the same kinematics as the original event (and without the underlying
event or pile-up interactions) is simulated. The calorimeter cell energies in the simulated event are
subtracted from the data event.

All calorimeter cell energies from the simulated mini event are then added to the corresponding
data cell energies, and all tracks are copied into the corresponding event. This inserts the pure
Z → ττ decay into the data environment, keeping the event properties as close to data conditions
as possible.

5. Reconstruction of the embedded events: 

Starting from the modified cell energies and the merged set of tracks, the hybrid Z → ττ events
are submitted to the ATLAS event reconstruction for collision data, which recreates the complete 
physics object final state by re-running all standard event reconstruction algorithms except for the 
track reconstruction. 
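
Steps 2a and 2b above can be illustrated with a minimal Python sketch. It assumes that the τ momentum keeps the muon direction and has magnitude √(E_μ² − m_τ²), consistent with keeping the energy unchanged while replacing the muon mass with the τ mass. The τ mass value, the function names and the representation of a decay configuration as a list of (kind, visible pT) pairs are assumptions made for illustration; in the actual procedure the decay configurations are produced by Tauola.

```python
import math

M_TAU = 1.77686  # GeV; tau mass value assumed here, not quoted in the text

# Generator-level filter thresholds in GeV, as quoted in step 2b.
FILTER_THRESHOLDS = {"tau_had": 15.0, "e": 18.0, "mu": 15.0}


def muon_to_tau(px, py, pz, energy, m_tau=M_TAU):
    """Rescale the muon momentum so that the unchanged energy matches the tau mass."""
    if energy <= m_tau:
        raise ValueError("muon energy must exceed the tau mass")
    p_mu = math.sqrt(px * px + py * py + pz * pz)
    scale = math.sqrt(energy * energy - m_tau * m_tau) / p_mu
    return px * scale, py * scale, pz * scale, energy


def passes_filter(decay, thresholds=FILTER_THRESHOLDS):
    """decay: list of (kind, visible_pt) pairs, one entry per tau lepton."""
    return all(pt > thresholds[kind] for kind, pt in decay)


def select_decay(decays, passes=passes_filter):
    """Return the first accepted configuration and the filter efficiency.

    The efficiency (accepted / generated) is the per-event weight that
    corrects the kinematic bias introduced by the filter.
    """
    accepted = [d for d in decays if passes(d)]
    if not accepted:
        return None, 0.0
    return accepted[0], len(accepted) / len(decays)
```

In the procedure described above, `decays` would hold the 1000 configurations generated for one input event, and the returned efficiency would be stored as the filter weight of the embedded event.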


The procedure is further illustrated by Figure 3, showing example displays of a Z → μμ input event, a
correspondingly simulated Z → ττ mini event (with one τ lepton decaying into a muon and the other one
hadronically), and the resulting embedded hybrid event. 
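
The merging in step 4 amounts to simple per-cell arithmetic and a replacement of tracks, as in the following Python sketch. Representing calorimeter cells as a mapping from a cell identifier to an energy and tracks as dictionaries with an `id` key is an assumption made for illustration and does not reflect the ATLAS event data model.

```python
def merge_cells(data_cells, sim_mumu_cells, mini_event_cells):
    """Subtract the simulated muon deposits and add the simulated tau decay deposits.

    Each argument maps a cell identifier to a cell energy.
    """
    merged = dict(data_cells)
    for cell_id, energy in sim_mumu_cells.items():
        merged[cell_id] = merged.get(cell_id, 0.0) - energy
    for cell_id, energy in mini_event_cells.items():
        merged[cell_id] = merged.get(cell_id, 0.0) + energy
    return merged


def merge_tracks(data_tracks, muon_track_ids, mini_event_tracks):
    """Remove the tracks of the original muons and copy in the mini-event tracks."""
    kept = [t for t in data_tracks if t["id"] not in muon_track_ids]
    return kept + list(mini_event_tracks)
```

The merged cells and tracks would then serve as input to the re-reconstruction of step 5, which re-runs everything except the track reconstruction.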



Figure 3: Displays of (a) a Z → μμ candidate event selected from the collision data, (b) the corresponding simulated
Z → ττ mini event and (c) embedded hybrid event. Here, one of the τ leptons decays into a muon and the other one
hadronically.


4.2 Special properties of the τ-embedded event samples

While in most respects the τ-embedded samples can be treated within physics analyses as standard
collision data, there are a few special properties to be considered:
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• The Z —» pp input data are subject to trigger and offline selection efficiencies, which particularly 
affect analyses with low pj selection thresholds for the t decay products. To account for these 
efficiencies, correction factors as a function of the transverse momenta and pseudorapidities of the 
input muons are extracted according to Refs. [15, 16] and applied to the r-embedded samples. 

• As discussed in Section 4.1, instead of recreating the charged-particle tracks from the tracking 
detector hits, the embedding procedure is performed with reconstructed tracks. As a side effect, 
the trigger response for the τ-embedded events is not available, since it would require the hit-level 
information. Therefore, any effect of the analysis-specific trigger selection needs to be evaluated 
and corrected for, e.g. through a parameterisation of the trigger efficiency measured in data. For 
the validation in Section 5.2, such corrections were derived corresponding to the Z → ττ selection 
described in Section 3.2 and applied to the τ-embedded samples. 

• The selected Z → μμ input data sample is of high purity, but small contaminations from other 
processes, e.g. tt̄ production, might be enhanced to relevant levels by selection requirements applied 
during physics analyses. Double counting of these contributions must hence be avoided when 
combining the τ-embedded events with other samples to construct a complete background model. 
In recent analyses, e.g. in Ref. [1], this is achieved by rejecting events from simulated samples 
of other background processes if they produce two τ leptons that fulfil the kinematic Z → μμ 
input selection at generator level. The corresponding ττ final states are already included in the 
τ-embedded sample, since they are obtained from the corresponding μμ background contamination 
from these other processes. 

• In deriving the kinematics of the embedded τ leptons from the reconstructed muons selected from 
the ATLAS data, the true kinematics of the Z decay are folded with the resolution of the muon 
reconstruction. Final-state radiation (FSR) from the input muons can also modify the kinematics 
of the embedded objects. Both effects are unavoidable and inseparable in the embedding of 
data events, but they can be studied separately using simulated samples and are found to be small 
(cf. Section 5.2). 

• While the τ-embedded samples constitute a largely data-driven model of Z → ττ events, the τ 
leptons and their decay products are based on simulation, and systematic uncertainties associated 
with the MC description of τ decays and the corresponding detector response need to be considered 
within physics analyses. Further documentation of these systematic uncertainties, e.g. for the 
hadronic τ decays, can be found in Ref. [25]. 

• The size of the τ-embedded samples is naturally limited by the available number of Z → μμ data 
events. Compared to a corresponding selection of ττ final states from the data, this number is 
effectively enhanced by applying the kinematic filter described in Section 4.1. 
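
The efficiency corrections mentioned in the first item above can be illustrated with a minimal sketch. It assumes a binned efficiency map in muon pT and |η| and assumes that the event weight is the inverse of the product of the per-muon efficiencies. The bin edges, the efficiency values and the function names are hypothetical placeholders; the actual corrections follow Refs. [15, 16] and are not reproduced here.

```python
import numpy as np

# Hypothetical bin edges and efficiencies, for illustration only.
PT_EDGES = np.array([20.0, 30.0, 50.0, np.inf])   # GeV
ETA_EDGES = np.array([0.0, 1.05, 2.5])            # |eta|
EFF = np.array([[0.90, 0.85],
                [0.95, 0.90],
                [0.97, 0.92]])                    # shape (n_pt_bins, n_eta_bins)


def muon_efficiency(pt, eta):
    """Look up the assumed efficiency for one input muon."""
    i = np.searchsorted(PT_EDGES, pt, side="right") - 1
    j = np.searchsorted(ETA_EDGES, abs(eta), side="right") - 1
    if not (0 <= i < EFF.shape[0] and 0 <= j < EFF.shape[1]):
        raise ValueError("muon outside the tabulated acceptance")
    return float(EFF[i, j])


def embedding_weight(muons):
    """Event weight as the inverse product of per-muon efficiencies (an assumption)."""
    weight = 1.0
    for pt, eta in muons:
        weight /= muon_efficiency(pt, eta)
    return weight


# Two input muons given as (pT in GeV, eta).
weight = embedding_weight([(25.0, 0.5), (40.0, -2.0)])
```

In such a scheme each τ-embedded event would carry the weight derived from its two input muons, so that regions with low input efficiency are weighted up accordingly.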


4.3 Systematic uncertainties 

Two different sources of systematic uncertainty are considered, which are motivated by the technical 
implementation of the embedding method and are thus estimated from the following variations of the 
embedding procedure: 

1. The isolation requirement applied in the selection of the Z → μμ input events can affect the 
environment of the embedded objects in the final event. It is thus varied in two alternative selections: the 
nominal isolation criterion of I(pT, 0.4)/pT(μ) < 0.2 is either completely removed or tightened to 
I(pT, 0.4)/pT(μ) < 0.06 and I(ET, 0.2)/pT(μ) < 0.04. These variations mainly affect the properties 
of the embedded objects, but they additionally provide an estimate of the background contamination 
from μμ final states with non-prompt muons in the τ-embedded samples. 

2. The subtraction of cell energy associated with the muon is based on the simulated calorimeter 
response, which can be subject to large uncertainties. Therefore, the simulated energy in each cell 
is scaled by ±20% before the subtraction from the data event. The size of this variation was 
motivated by the results of comparisons of τ-embedded collision-data and simulated events to standard 
Z → ττ MC samples. 
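
The second variation can be sketched as follows. The snippet assumes a simple cell-by-cell model in which the simulated muon deposit is subtracted from the data cell energy and the simulated τ deposit is added; the function and variable names are illustrative and do not correspond to the ATLAS software.

```python
import numpy as np


def hybrid_cell_energies(e_data, e_mu_sim, e_tau_sim, mu_scale=1.0):
    """Cell energies of the hybrid event in a simplified cell-by-cell model.

    mu_scale rescales the simulated muon deposit before it is subtracted:
    1.0 is the nominal case, 0.8 and 1.2 correspond to the -20% / +20% variations.
    """
    e_data = np.asarray(e_data, dtype=float)
    e_mu_sim = np.asarray(e_mu_sim, dtype=float)
    e_tau_sim = np.asarray(e_tau_sim, dtype=float)
    return e_data - mu_scale * e_mu_sim + e_tau_sim


# Toy energies (GeV) for three cells, invented for illustration.
e_data = [1.0, 2.0, 0.5]
e_mu = [0.5, 0.0, 0.1]
e_tau = [0.0, 1.0, 0.0]

variations = {
    name: hybrid_cell_energies(e_data, e_mu, e_tau, scale)
    for name, scale in {"nominal": 1.0, "down": 0.8, "up": 1.2}.items()
}
```

Only cells with a non-zero simulated muon deposit change under the variation, which is why its effect is concentrated in the vicinity of the embedded objects.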

For all embedded event samples listed in Section 3.1, the different variations are produced in parallel. 
The resulting datasets are then used to derive and validate the embedding-related systematic 
uncertainties. Different selection efficiencies, e.g. due to the modified isolation requirements, are absorbed by 
normalising the systematic variations to the default sample. For both estimates of systematic uncertainties, 
the remaining shape uncertainties are later symmetrised to the larger of the two variations, in particular 
compensating for the non-symmetric isolation criteria. Figure 4 illustrates the effect on the distributions 
of two example quantities after the Z → ττ selection as described in Section 3.2. 
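
The normalisation and symmetrisation steps described above might look as follows for binned distributions. This is a sketch under the assumption that the symmetrisation is done bin by bin, taking the larger absolute deviation of the two normalised variations; the function name is hypothetical.

```python
import numpy as np


def symmetrised_shape_uncertainty(nominal, var_a, var_b):
    """Per-bin symmetric shape uncertainty from two systematic variations.

    Each variation is first normalised to the yield of the nominal sample,
    which absorbs differences in selection efficiency. The larger of the
    two absolute deviations is then taken in each bin (assumed convention).
    """
    nominal = np.asarray(nominal, dtype=float)
    deviations = []
    for variation in (var_a, var_b):
        variation = np.asarray(variation, dtype=float)
        variation = variation * nominal.sum() / variation.sum()
        deviations.append(np.abs(variation - nominal))
    return np.maximum(deviations[0], deviations[1])


# Toy histograms: nominal, tight isolation, no isolation (invented numbers).
sigma_shape = symmetrised_shape_uncertainty([10, 20, 10], [6, 10, 4], [9, 22, 9])
```

The resulting band is symmetric around the nominal distribution even when the two variations, such as the tightened and removed isolation requirements, are not.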

Modifications of the input muon kinematics due to final-state radiation or the detector resolution, which 
could be considered as somewhat more fundamental sources of systematic effects, do not directly enter the 
above definitions of embedding-related uncertainties. Their impact is, however, expected to be correlated 
with the variations of the cell energy subtraction and the muon isolation and in fact turns out to be small 
in comparison, as demonstrated in Section 5.2. 

Figure 4: Distributions of (a) the calorimeter isolation of the selected lepton and (b) the ττ invariant mass obtained 
with the MMC, illustrating the effects of systematic variations as described in the text: scaling the subtracted 
cell energy by ±20% and applying tight / no isolation requirements in the Z → μμ selection. The ratios of the 
distributions before and after specific systematic variations are included as well: the upper ratio plot shows the 
effect of no (tight) isolation in blue (green), in the lower one the effect of scaling the subtracted cell energy by 
+20% (-20%) is illustrated by triangles pointing upwards (downwards). In both plots the red lines correspond to 
the nominal embedded sample. 


5 Validation 


A careful validation of the embedding procedure is performed based on different combinations of the 
event samples described in Section 3.1. The results of these studies are discussed in the following. All 
distributions are normalised to unit area unless stated otherwise. 
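
For reference, the unit-area normalisation and the relative differences shown in the ratio panels of the following figures can be written as below. This is a minimal sketch that assumes equal-width bins, so that unit area is equivalent to unit sum of bin contents; the function names are illustrative.

```python
import numpy as np


def unit_area(counts):
    """Normalise bin contents to unit sum (equal-width bins assumed)."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total <= 0:
        raise ValueError("cannot normalise an empty histogram")
    return counts / total


def relative_difference(test, reference):
    """Per-bin (test - reference) / reference after unit-area normalisation."""
    t, r = unit_area(test), unit_area(reference)
    out = np.full_like(r, np.nan)          # bins with empty reference stay NaN
    np.divide(t - r, r, out=out, where=r > 0)
    return out


ratio_panel = relative_difference([2, 2], [1, 3])
```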


5.1 Z → μμ-based validation 

The first set of studies is based on muon-embedded data and MC samples, where the original muons 
are removed and replaced with the decay products of correspondingly simulated Z → μμ decays. In 
this case, events with Z → μμ decays and jets constitute both the input and the output samples and thus 
distributions of any quantity for the same events before and after the embedding can be compared directly. 
Such comparisons provide a powerful validation of most aspects of the procedure by testing for biases 
introduced in the removal of tracks and cells associated with the input muons, the stand-alone simulation 
of the Z mini event or the creation and re-reconstruction of the embedded hybrid event. None of the 
trigger and selection efficiency corrections discussed in Section 4.2 are applied here. 

In order to investigate possible distortions of the detector response close to the input muons, Figure 5 
compares the distributions of the absolute (I(ET, 0.2)) and relative (I(ET, 0.2)/pT) muon calorimeter isolation 
as defined in Section 3.2, before and after μ embedding. Here, the displayed errors do not include the 
isolation systematic uncertainty, which is obtained by varying an explicit cut on the relative calorimeter 
isolation, cf. Section 4.3, and is thus not well defined in these specific comparisons. The observed changes 
in the distributions, which indicate fluctuations in the estimation of the calorimeter energy associated with 
the input muons based on an independent simulation discussed in Section 4.1, are not fully covered by 
the remaining embedding-specific uncertainties. However, this mainly concerns negative isolation 
values, which are far away from standard isolation requirements as also used for the studies presented in this 
paper, and the region with I(ET, 0.2)/pT > 0.04, where the undisplayed isolation uncertainty becomes 
very large by construction. In corresponding comparisons, the kinematics of additional jets in the event 
are found to be unaffected by the embedding procedure. 

Figure 5: Comparison of Z → μμ data events before (blue) and after μ embedding (black points) in terms of (a) the 
calorimeter isolation and (b) the relative calorimeter isolation in a cone ΔR = 0.2, each including ratios showing 
the relative differences of the distributions after μ embedding. The grey hatched error band corresponds to the cell 
energy systematic uncertainties of the μ-embedded events, as described in Section 4.3. 

Figure 6: Comparison of Z → μμ data events before (blue) and after μ embedding (black points): (a) transverse 
momentum of the leading muon and (b) dimuon mass, each including ratios showing the relative differences of the 
distributions after μ embedding. The light (dark) grey hatched error band corresponds to the sum in quadrature of 
cell + isolation (cell only) systematic uncertainties of the μ-embedded events. 
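
The calorimeter isolation quantities compared in Figure 5 can be illustrated with a short sketch. It assumes that I(ET, 0.2) is the sum of the transverse energies of calorimeter cells within ΔR < 0.2 of the muon direction; the precise definition is given in Section 3.2 and may differ in detail (for example in the treatment of the muon's own deposit). All names are illustrative.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def calo_isolation(muon, cells, cone=0.2):
    """Return (absolute, relative) calorimeter isolation for one muon.

    muon  : (pT, eta, phi)
    cells : iterable of (ET, eta, phi); ET may be negative after the
            subtraction of the simulated muon deposit.
    """
    pt, eta, phi = muon
    iso = sum(et for et, c_eta, c_phi in cells
              if delta_r(eta, phi, c_eta, c_phi) < cone)
    return iso, iso / pt


# Toy example: the first and third cell lie inside the cone.
cells = [(1.0, 0.05, -3.1), (2.0, 1.0, 3.1), (-0.2, 0.0, 3.0)]
iso_abs, iso_rel = calo_isolation((40.0, 0.0, 3.1), cells)
```

Negative cell energies after an over-subtraction of the muon deposit lower the sum and can drive it below zero, which is how negative isolation values as discussed above can arise in this simplified picture.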

For quantities directly related to the muon four-momenta, most changes are found to be within the 
uncertainties; for example, Figure 6(a) shows the transverse momentum of the leading muon. In some 
cases, however, larger effects are observed, in particular for the dimuon invariant mass as depicted in 
Figure 6(b); small differences are also found at the low end of the distributions of the transverse momentum 
of the dimuon system and of the missing transverse momentum. Such differences are actually expected 
since the kinematics of the embedded events are based on reconstructed input muons and thus are 
potentially modified by the detector resolution and final-state radiation (FSR), as explained in Section 4.2. This 
is investigated further by using generator-seeded embedded samples, where simulated Z → μμ events 
are used as input and the kinematics of the embedded objects are derived from the generator-level muon 
momenta instead of the reconstructed information, cf. Section 3.1, thus removing FSR and muon 
reconstruction effects. This indeed improves the agreement in the muon-related distributions shown for the 
leading muon pT and the dimuon mass in Figure 7. While these simulation-based studies confirm the 
source of the differences in Figure 6, muon reconstruction and FSR effects unavoidably enter the 
embedding of data events. For the eventual applications of τ embedding, however, these differences turn out to 
be negligible as demonstrated in the next section. 

Figure 7: Comparison of Z → μμ MC events (blue) and generator-seeded μ embedding (black points): (a) transverse 
momentum of the leading muon and (b) dimuon mass, each including ratios showing the relative differences 
of the distributions after generator-seeded μ embedding. The light (dark) grey hatched error band corresponds to 
the sum in quadrature of cell + isolation (cell only) systematic uncertainties and the statistical uncertainties of the 
μ-embedded events. 


5.2 Z → ττ-based validation 

The Z → μμ-based results presented above already provide confidence that the technical implementation 
of the embedding procedure is working correctly. Nevertheless, direct comparisons of ττ final states must 
also be performed in order to conclusively validate the modelling of Z → ττ events provided by the final 
τ-embedded samples. Since it is difficult to obtain a sufficiently pure Z → ττ reference sample from 
the collision data, the validation is mainly based on comparisons of τ-embedded Z → μμ MC events 
to standard Z → ττ MC samples. Still, comparisons of selected Z → ττ collision data to a combined 
background model including τ-embedded data are also provided in the last part of this section. 


Input muon radiation and reconstruction effects 

The embedding procedure includes two effects related to the input muons that are unavoidable by 
construction: the resolution of the reconstructed muon momenta used to derive the kinematics of the 
embedded mini event and FSR from the input muons. In order to judge if the resolution effects observed 
in Section 5.1 are significant for the eventual τ embedding, Figure 8 compares the distributions of the 
τ decay lepton transverse momentum and of the invariant mass of the visible ττ decay products, m_ττ^vis, 
for generator- and detector-seeded τ embedding. These comparisons demonstrate that the uncorrected 
resolution and final-state radiation of the input muons are negligible in the case of reconstructed ττ final 
states, for which the mass resolution is dominated by the neutrinos produced in the τ decay. 

Figure 8: Comparison of generator-seeded (gen.-s.), in blue, and detector-seeded (det.-s.), as black points, 
τ-embedded Z → μμ MC events: (a) transverse momentum of the leading lepton and (b) invariant mass of the visible 
ττ decay products, each including ratios showing the relative differences of the distributions from detector-seeded 
τ embedding. The blue error band in the ratio plots corresponds to the statistical uncertainties of the 
generator-seeded events, and the black error bars are the statistical uncertainties associated with the detector-seeded embedded 
events. The light (dark) grey hatched error band corresponds to the sum in quadrature of cell + isolation (cell only) 
systematic uncertainties and the statistical uncertainties of the detector-seeded τ-embedded events. 


Comparison of τ-embedded Z → μμ MC samples with standard Z → ττ MC 

In contrast to a data-to-data comparison of ττ final states, which necessarily includes contaminations from 
other background processes, the τ embedding of simulated Z → μμ events and subsequent comparison 
to standard Z → ττ MC samples provides a well-defined way to further study the method at the ττ level. 
Here, as opposed to the studies presented in Section 5.1, the two compared distributions are obtained from 
statistically independent event samples. Also, the corrections discussed in Section 4.2, including those 
related to the selection of the Z → μμ events used as input for the embedding procedure and to the trigger 
selection of the τ decay products, now need to be applied. The combined effect of these corrections is 
shown in Figure 9 for the distributions of two quantities closely related to their source: the 
pseudorapidity and the transverse momentum of the τ decay lepton. The mismodelling of the pseudorapidity before 
corrections, cf. Figure 9(a), is due to detector acceptance differences between the input muons and 
embedded τ objects. While the corrections have a visible effect here, their impact is found to be very small 
for the lepton pT shown in Figure 9(b) and also for any other of the investigated quantities. Even after 
corrections, the modelling of the pseudorapidity is not perfect but, as demonstrated below, this has no 
impact on observables relevant for physics analyses. 

Figure 9: Comparison of τ-embedded Z → μμ MC events (black points) with Z → ττ MC events (blue) for 
(a) the pseudorapidity and (b) the transverse momentum of the τ decay lepton, each including ratios showing the 
relative differences of the τ-embedded distributions. In addition, the red squares show the distributions obtained 
from the τ-embedded MC sample before applying the embedding-specific corrections. The blue error band in 
the ratio plots corresponds to the statistical uncertainties of the Z → ττ MC sample. The black error bars are 
the statistical uncertainties associated with the corrected τ-embedded events. The light (dark) grey hatched error 
band corresponds to the sum in quadrature of cell + isolation (cell only) systematic uncertainties and the statistical 
uncertainties of the corrected τ-embedded events. 

Further examples of such comparisons, from here on omitting the uncorrected distributions, are collected 
in Figure 10 and Figure 11. Figures 10(a) and 10(b) show the distributions of two of the input quantities 
for the hadronic τ identification: the central energy fraction, which is the ratio of the transverse energy 
deposited within ΔR < 0.1 to that deposited within ΔR < 0.2 around the τ candidate direction, and the 
leading-track momentum fraction, i.e. the transverse momentum of the highest-pT charged particle divided by the 
calorimetric transverse energy within ΔR < 0.2 [25]. Agreement of the distributions within statistical and 
embedding-related systematic uncertainties indicates that the detector response to embedded τ leptons does not differ 
significantly from the standard Z → ττ MC samples. This is further confirmed by the fact that the 
τ identification efficiency is found to agree for τ-embedded and standard MC samples for all working 
points defined in Ref. [25] within uncertainties. Agreement is also observed for the kinematics of the Z 
decay products, as demonstrated for the τ_had pT and m_ττ^vis in Figures 10(c) and 10(d). 
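
The two identification inputs defined above can be written out explicitly. The sketch below follows the definitions given in the text; the input format (cells as pairs of transverse energy and ΔR to the τ candidate direction) and the function name are assumptions made for illustration.

```python
def tau_id_inputs(cells, track_pts):
    """Return (central energy fraction, leading-track momentum fraction).

    cells     : iterable of (ET, dR) with dR measured from the tau candidate direction
    track_pts : transverse momenta of the charged particles of the candidate
    """
    cells = list(cells)
    et_01 = sum(et for et, dr in cells if dr < 0.1)
    et_02 = sum(et for et, dr in cells if dr < 0.2)
    if et_02 <= 0 or not track_pts:
        raise ValueError("need positive ET within dR < 0.2 and at least one track")
    f_cent = et_01 / et_02              # ET(dR < 0.1) / ET(dR < 0.2)
    f_track = max(track_pts) / et_02    # highest track pT / ET(dR < 0.2)
    return f_cent, f_track


# Toy three-prong candidate (energies and momenta in GeV, invented values).
f_cent, f_track = tau_id_inputs(
    cells=[(20.0, 0.05), (5.0, 0.15), (3.0, 0.30)],
    track_pts=[15.0, 5.0, 2.5],
)
```

Both quantities depend on the calorimeter energy around the embedded τ candidate and are therefore sensitive to any residual mismodelling from the cell energy subtraction.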

Figures 11(a) and 11(b) compare the distributions of the missing transverse momentum, arising from the 
simulated τ decay neutrinos and reconstruction effects, and of m_ττ^MMC. Again, no significant differences 
are observed and the same conclusions are reached for jet-related quantities, such as the leading-jet pT 
and the pseudorapidity separation of the two leading jets shown in Figures 11(c) and 11(d). 

Thus, the τ-embedded Z → μμ MC events and standard Z → ττ MC events are found to agree in all 
distributions identified to be relevant for physics analyses within the statistical and embedding-related 
systematic uncertainties described in Section 4.3. These comparisons include effects from the 
modification of the input muon kinematics due to final-state radiation and resolution and thus confirm that such 
effects are also covered by the current τ-embedding uncertainties. 
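
The notion of agreement within uncertainties used above can be made concrete with a small sketch. It combines the statistical, cell-energy and isolation uncertainties of the embedded sample in quadrature, as described for the error bands in the figure captions, and then assumes a simple per-bin criterion that also includes the statistical uncertainty of the reference sample. The criterion and the function names are illustrative assumptions.

```python
import numpy as np


def total_uncertainty(stat, syst_cell, syst_iso):
    """Per-bin sum in quadrature of statistical, cell-energy and isolation uncertainties."""
    stat, syst_cell, syst_iso = (
        np.asarray(x, dtype=float) for x in (stat, syst_cell, syst_iso)
    )
    return np.sqrt(stat**2 + syst_cell**2 + syst_iso**2)


def agrees_within_uncertainties(embedded, reference, sigma_embedded, stat_reference):
    """Per-bin flag: |embedded - reference| within the combined uncertainty."""
    embedded = np.asarray(embedded, dtype=float)
    reference = np.asarray(reference, dtype=float)
    sigma = np.hypot(np.asarray(sigma_embedded, dtype=float),
                     np.asarray(stat_reference, dtype=float))
    return np.abs(embedded - reference) <= sigma


sigma_emb = total_uncertainty([3.0], [4.0], [12.0])
flags = agrees_within_uncertainties([100, 50], [110, 80], [13, 5], [5, 5])
```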

Figure 10: Comparison of τ-embedded Z → μμ MC events (black points) with Z → ττ MC events (blue): (a) 
central energy fraction, (b) leading-track momentum fraction for three-prong hadronic τ decays, (c) τ_had transverse 
momentum and (d) mass of the visible ττ decay products, each including ratios showing the relative differences of 
the τ-embedded distributions. The blue error band in the ratio plots corresponds to the statistical uncertainties of 
the Z → ττ MC sample, and the black error bars are the statistical uncertainties associated with the τ-embedded 
events. The light (dark) grey hatched error band corresponds to the sum in quadrature of cell + isolation (cell only) 
systematic uncertainties and the statistical uncertainties of the τ-embedded events. 

Figure 11: Comparison of τ-embedded Z → μμ MC events (black points) with Z → ττ MC events (blue): (a) 
missing transverse momentum, (b) ττ invariant mass obtained with the MMC, (c) transverse momentum of the 
leading jet and (d) pseudorapidity difference for the two leading jets, each including ratios showing the relative 
differences of the τ-embedded distributions. The blue error band in the ratio plots corresponds to the statistical 
uncertainties of the Z → ττ MC sample, and the black error bars are the statistical uncertainties associated with 
the τ-embedded events. The light (dark) grey hatched error band corresponds to the sum in quadrature of cell + 
isolation (cell only) systematic uncertainties and the statistical uncertainties of the τ-embedded events. 


Performance within physics analyses 

In a final step, the τ-embedded Z → μμ collision data events are used as part of a combined background 
model and compared to data in the boosted Z-enriched control region defined in Section 3.2. Due to 
significant contributions from other background processes, this is not a clean, stand-alone validation of the 
embedding method but involves other background estimation procedures, performed exactly as in Ref. [1]. 
Since the selection of Z → μμ data events used as input for the embedding procedure includes a cut on 
the invariant mass m_μμ > 40 GeV, low mass Drell-Yan processes with ττ final states are not modelled 
via the embedding technique. Instead, these contributions are separately estimated from simulated event 
samples. A few example comparisons are given in Figures 12 and 13. In those distributions the embedded 
samples are normalised to data in a dedicated region as described in Ref. [1]. The combined background 
distributions, dominated by the embedding-based Z → ττ model, are found to provide a good description 
of the ATLAS data within the uncertainties, which here also include other relevant uncertainties related 
to the estimation of the other background contributions as described in Ref. [1]. 

Figure 12: Comparison of data with the combined background model for example observables in the boosted 
Z-enriched control region: (a) τ_had transverse momentum, (b) invariant mass of the visible ττ decay products, 
(c) missing transverse momentum and (d) the ττ invariant mass obtained with the MMC, each including ratios 
showing the relative differences of the data to the total background estimate. The background contributions from 
other processes and the systematic uncertainties are estimated as described in Ref. [1]. 

Figure 13: Comparison of data with the combined background model for example observables in the boosted 
Z-enriched control region: transverse momentum of (a) the Z boson and (b) the leading jet, each including ratios 
showing the relative differences of the data to the total background estimate. The background contributions from 
other processes and the systematic uncertainties are estimated as described in Ref. [1]. 
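
The m_μμ > 40 GeV requirement on the input events mentioned above can be illustrated as follows. The sketch uses the standard expression for the invariant mass of two particles in the massless approximation, m² = 2 pT1 pT2 (cosh Δη − cos Δφ), which is adequate for muons at these momenta; the function names are illustrative and other parts of the input selection are not included.

```python
import math


def dimuon_mass(mu1, mu2):
    """Invariant mass of two muons given as (pT, eta, phi), massless approximation."""
    pt1, eta1, phi1 = mu1
    pt2, eta2, phi2 = mu2
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))


def passes_input_mass_cut(mu1, mu2, threshold=40.0):
    """Mass requirement of the embedding input selection (threshold in GeV)."""
    return dimuon_mass(mu1, mu2) > threshold


# Two back-to-back central muons with pT = 45.6 GeV each.
m_z_like = dimuon_mass((45.6, 0.0, 0.0), (45.6, 0.0, math.pi))
```

Events failing this requirement never enter the embedding input, which is why the low-mass Drell-Yan contribution has to be taken from simulation.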


6 Summary and conclusions 

This paper presented the motivation, concept and technical implementation of a τ-embedding method, 
which models events with Z → ττ decays and possibly additional jets in a largely data-driven way. 
In Z → μμ events selected from pp collision data recorded with the ATLAS experiment during the 
LHC Run 1, tracks and calorimeter cell energies associated with the Z decay muons are replaced by the 
corresponding tracks and energy depositions of the τ leptons from simulated Z → ττ decays. For each 
event, the τ kinematics are derived from the original Z → μμ decay, so that their correlations with 
other event properties such as additional jets and the reconstructed missing transverse momentum are 
preserved in the resulting hybrid Z → ττ events. Systematic uncertainties are estimated by varying the 
muon isolation requirement and the subtracted energy depositions associated with the muons. Extensive 
validation studies were performed using both the μμ and ττ final states, presented here only for the 
example where one τ lepton decays into an electron or muon and the other hadronically. The μμ-based 
results demonstrate that the procedure successfully replaces objects in the data events without affecting 
other event properties. Comparing τ-embedded Z → μμ MC events with standard Z → ττ MC, agreement 
was found for distributions of all quantities relevant to current physics analyses within the combined 
statistical and embedding-related systematic uncertainties. Other conceptual limitations of the method 
related to the input of reconstructed muon kinematics are found to introduce only small effects compared 
to the uncertainties estimated from variations of the method. For Higgs analyses in ττ final states, which 
exploit intricate signatures of additional jets and their correlation with the ττ decay kinematics, the 
τ-embedded data thus provide a reliable model of the irreducible background from events with Z → ττ 
decays and jets. 